Last Updated: 2023-06-21
This page will house readings & reflections
My boss Andrew Vickers wants me to read 2 papers from our group each day. I will keep a record of my readings and reflections here. All articles obtained by searching my teammates’ names on Google Scholar or NCBI (Andrew Vickers, Melissa Assel, Amy Tin, Emily Vertosick, Dan Sjoberg).
Notes:
Only read up to end of results
The researchers used a statistical method called “multiple imputation with chained equations” to fill in missing data for certain variables, and then repeated their main analyses with this filled-in data to ensure their results were not significantly influenced by these missing values.
Sensitivity analyses using multiple imputation seem to be a common way to evaluate how missingness affects analyses
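To make the mechanics concrete for myself, here is a stripped-down, two-variable chained-equations loop (a toy sketch only; real work would use something like R's mice package, which also adds random draws to reflect imputation uncertainty, omitted here): start from mean imputation, then alternately regress each variable on the other and re-fill its missing entries.

```python
# Toy two-variable "chained equations" imputation (illustrative sketch only).
def fit_line(xs, ys):
    """Ordinary least squares: returns (intercept, slope) for ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def chained_impute(x, y, n_iter=10):
    """Fill None entries by alternating regressions of y on x and x on y."""
    x, y = list(x), list(y)
    miss_x = [i for i, v in enumerate(x) if v is None]
    miss_y = [i for i, v in enumerate(y) if v is None]
    for vec, miss in ((x, miss_x), (y, miss_y)):
        obs = [v for i, v in enumerate(vec) if i not in miss]
        mean = sum(obs) / len(obs)
        for i in miss:
            vec[i] = mean                 # mean imputation as a starting point
    for _ in range(n_iter):
        a, b = fit_line(x, y)
        for i in miss_y:                  # re-impute missing y from current x
            y[i] = a + b * x[i]
        a, b = fit_line(y, x)
        for i in miss_x:                  # re-impute missing x from current y
            x[i] = a + b * y[i]
    return x, y

xf, yf = chained_impute([1.0, 2.0, 3.0, 4.0, None, 6.0],
                        [2.1, 3.9, None, 8.2, 9.8, 12.1])  # roughly y = 2x
```

The missing entries get pulled onto the regression line instead of sitting at the mean, which is the basic idea behind each "chain" step.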
Notes:
I only read up to end of results
Point of paper is to outline that family history for PCa is a risk factor for PCa, but that family history including history of any other malignancies suggestive of hereditary cancer syndrome can also be a risk factor
Appears this paper’s results section is just repeating what’s in the tables
Observer Bias: Refers to the distortion in research results that occurs when an experimenter’s expectations or preconceptions unconsciously influence the collection, interpretation, or analysis of data.
Notes:
Every sentence seems to say something unequivocally, and the ideas flow from one to the next; I need to bring this into my writing.
How to evaluate a marker: (1) Marker must be significantly different between control and disease group, (2) Marker must add information not previously known
This sentence makes sense, but I want to make it click instantaneously: Several groups have reported that PSA velocity fails to improve the specificity of PSA for biopsy, that is, it does not help the decision to biopsy men with elevated PSA.19-23
The sentence above basically says that PSAV fails to improve the specificity of PSA for biopsy (doesn’t help the decision to biopsy). Specificity is TN/(TN + FP), so the false positives are left as is or made worse. (Note to self: TP/(TP + FP) is the positive predictive value, not specificity.)
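To keep these quantities straight, a toy confusion matrix with invented counts (nothing from the paper):

```python
# Toy confusion-matrix arithmetic (invented counts, for my own reference).
tp, fn = 80, 20   # men with cancer: detected vs missed
tn, fp = 90, 10   # men without cancer: spared vs unnecessarily biopsied

sensitivity = tp / (tp + fn)   # of those WITH disease, fraction caught
specificity = tn / (tn + fp)   # of those WITHOUT disease, fraction spared
ppv = tp / (tp + fp)           # of positive calls, fraction correct
```

So a marker that "fails to improve specificity" leaves the fp count, and hence the number of unnecessary biopsies, untouched.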
Ask Andrew about this sentence: PSA velocity was not found to improve the sensitivity of PSA: using high PSA velocity as a criterion to biopsy men with low PSA would lead to many millions of additional biopsies per year, without a correspondingly important increase in the number of high-grade cancers detected.
Regarding the above, it says PSAV not found to increase sensitivity, but then it says it would lead to additional biopsies without correspondingly important increase in # high-grade cancers detected. The first part and second part don’t match, since the second part implies that sensitivity is decreased. If that is the case, why not just say that PSAV decreases sensitivity? If that isn’t the case, then the second part is wrong.
How does use of randomized trial data avoid these issues: In clinical cohorts, it might not be clear how a patient was referred, whether the PSA record is complete, or what the criteria for biopsy were, leading to problems such as verification bias. Use of randomized trial data avoids these problems.
Answer to above: Randomized trials, through their design, control for factors such as referral methods, biopsy criteria, and ensure complete record-keeping, thereby reducing biases common in clinical cohorts.
PSA Velocity should not be used to decide whether or not to do a biopsy, as it will lead to many unnecessary procedures. However, it can aid patient counselling in advanced disease stages. PSA changes (high velocity) while on treatment for advanced disease can indicate disease resistance to treatment.
Vocab
PSA Velocity: PSA velocity is a term used to describe the rate of change in PSA levels over time. Specifically, it’s a measure of how rapidly PSA levels increase within a specific period.
Receiver Operating Characteristic Curve (ROC): plotting TPR vs FPR for single classifier at a variety of thresholds
risk count method for calculating PSA: Can’t find this definition anywhere
verification bias: occurs during investigations of diagnostic test accuracy when there is a difference in testing strategy between groups of individuals, leading to differing ways of verifying the disease of interest
Standard Arms: In the context of clinical trials, “standard arms” refers to the control groups that receive the currently accepted and widely used treatment for a condition.
metastatic hormone-refractory prostate cancer: also known as castration-resistant prostate cancer (CRPC), is a form of prostate cancer that continues to progress despite the use of hormone therapy.
concordance index: a measure used in statistics to assess the predictive accuracy of a model, especially in the context of survival analysis or in scenarios where outcomes are ordered (such as risk scores). It is a generalization of the area under the Receiver Operating Characteristic (ROC) curve, which is a graphical plot that illustrates the diagnostic ability of a binary classifier system.
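Since the concordance index generalizes the area under the ROC curve, a toy sketch of both in one place (illustrative only; a real c-index for survival data must also handle censoring, which this skips):

```python
from itertools import combinations

# Toy sketch: ROC points for a binary outcome, and a concordance index for
# fully observed (uncensored) event times.
def roc_points(scores, labels):
    """(FPR, TPR) at each distinct threshold, highest threshold first."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

def concordance_index(risks, times):
    """Fraction of usable pairs where the higher-risk subject has the
    earlier event time; ties in risk count as half-concordant."""
    conc = pairs = 0.0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied event times are not usable pairs here
        pairs += 1
        early, late = (i, j) if times[i] < times[j] else (j, i)
        if risks[early] > risks[late]:
            conc += 1.0
        elif risks[early] == risks[late]:
            conc += 0.5
    return conc / pairs

pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])  # perfectly separating score
cidx = concordance_index([3, 2, 1], [1, 2, 3])        # perfect risk ranking
```

A perfectly separating score passes through (FPR = 0, TPR = 1), and a perfect risk ranking gives a c-index of 1.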
Notes:
4k score better at discrimination among those with higher PSAs
Not too many figures/tables, and didn’t use DCA. Wouldn’t this have been an ideal place to use it? Could compare all different models (4k, 4k + Polygenic Risk Score (PRS), PSA, fPSA, etc)
Vocab
Nested case-control study: In a nested case-control study, from a big group, you compare people who got a disease (“cases”) with similar people who didn’t (“controls”) to find differences that might explain why the disease occurred.
Notes
I never knew measuring total PSA produced an excess number of false positives
Just defining sensitivity and specificity below since I always forget their defns and which one is which
re this line in paper: ‘High-grade cancer was defined as Gleason score ≥7’, this is gleason score of 3+4 or higher, but there is debate around 3+4 being high grade. 4+3, however, is definitely high-grade
re this line: Briefly, decision curve analysis graphically illustrates the net benefit obtained by using the predictive models in a patient by assuming that the threshold probability for having all prostate cancer or high-grade prostate cancer at which a patient would opt for biopsy is informative of how the patient weighs the relative harms of a false-positive and a false-negative prediction. How does DCA take into account false negatives?
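One way to see where false negatives enter: the standard net-benefit formula from the decision-curve literature weights false positives by the odds of the threshold probability, and that threshold is exactly how the patient trades a false positive against a false negative. A minimal sketch with made-up counts:

```python
# Net benefit at threshold probability pt; the pt/(1 - pt) weight is the harm
# of a false positive relative to a false negative (illustrative numbers only).
def net_benefit(tp, fp, n, pt):
    return tp / n - (fp / n) * (pt / (1 - pt))

nb = net_benefit(tp=10, fp=20, n=100, pt=0.10)  # low threshold: FPs weighted lightly
```

A patient with a low threshold considers missing a cancer (FN) much worse than an unnecessary biopsy (FP), so FPs cost little; at pt = 0.5 the odds weight is 1 and each FP cancels a TP one-for-one.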
In this paper, figures/tables at the end. Seems w.r.t. format there’s leeway.
Vocab
Sensitivity: Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of actual positives that are correctly identified as such. In other words, it’s a measure of how well a test correctly identifies a condition when that condition is indeed present.
Specificity: Specificity (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such. That is, it’s a measure of how well a test identifies the absence of a condition.
discrimination: This is the ability of a predictive model to differentiate between positive cases and negative cases. In other words, it’s a measure of how well the model can distinguish or ‘discriminate’ between those who have an event or condition and those who do not. A commonly used statistic for measuring discrimination is the area under the receiver operating characteristic curve (AUC-ROC). A perfect model has a discrimination of 1, while a model that performs no better than random chance has a discrimination of 0.5 when using AUC-ROC.
calibration: This refers to the agreement between predicted probabilities of an event and the observed frequencies of the event. A model is said to be well-calibrated if the probabilities it predicts correspond well with the actual outcome frequencies. For example, among patients who are all predicted to have a 30% chance of a disease, about 30% of them should actually have the disease for the model to be well-calibrated. Calibration can be assessed visually using a calibration plot or more formally using statistics like the Brier score or the Hosmer-Lemeshow test.
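A quick way to check calibration on toy data: bin by predicted probability and compare the mean prediction to the observed event rate in each bin, plus the Brier score mentioned above (my own minimal sketch, invented data):

```python
# Minimal calibration check and Brier score on toy data (illustrative only).
def calibration_bins(preds, outcomes, n_bins=2):
    """Map bin index -> (mean predicted risk, observed event rate)."""
    bins = {}
    for p, y in zip(preds, outcomes):
        b = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(b, []).append((p, y))
    return {b: (sum(p for p, _ in v) / len(v),   # mean predicted risk
                sum(y for _, y in v) / len(v))   # observed event rate
            for b, v in bins.items()}

def brier(preds, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)

cal = calibration_bins([0.3, 0.3, 0.8, 0.8], [0, 1, 1, 1])
```

In a well-calibrated model the two numbers in each bin should be close; here the low-risk bin predicts 0.3 but the observed rate is 0.5.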
Notes
This is basically the older version of Melissa’s guidelines. Still skimmed through this.
Interesting: Avoid reporting sensitivity and specificity for continuous predictors or a model
Basically what I’m getting from above and guidelines in Melissa’s paper is that if you can help it, do not categorize continuous variables, and avoid reporting sensitivity/specificity for them
I read this already, but just reading again to let these guidelines sink in.
Notes
Comparing this to other papers I’ve read, it seems that there are many different kinds of papers. There are research papers that report data, analyses conducted, results, and conclusions. Then there are papers that report guidelines like this one. Given the breadth of paper paradigms, I wonder what kind of papers are not allowed.
This kind of seems like an opinion piece with suggestions.
2005 Scales et al. paper reports in the month that papers were analyzed, 71% of papers with comparative stats had at least 1 statistical flaw. Now I have to look into this paper, but I wonder what kind of flaws this paper is referring to. Would including an extra sig. fig. for a result in a table constitute an error? If so, I think saying 71% of papers contain at least 1 flaw is disingenuous.
This Scales et al. paper should be next on my list of papers to read.
I hate how pedantic statisticians can be about stats. One thing my current boss did in my initial interview with him was ask me to define a p-value. I got it wrong of course, because the true definition is ‘the probability of obtaining data at least as extreme as those observed, assuming that the null hypothesis is true.’
Guidelines to read:
Guideline 3.5 is interesting: the more hypothesis tests you run, the commensurately higher the probability that you falsely reject at least one null.
This carries with it the implicit recommendation that you should limit the number of questions, and answer each question with a single p-value (guidelines 3.6/3.7)
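Guideline 3.5 in numbers: if all nulls are true and the m tests are independent at alpha = 0.05, the chance of at least one false rejection is 1 - (1 - alpha)^m (my own illustration; real tests are rarely fully independent):

```python
# Family-wise error probability for m independent tests at level alpha
# (illustration of guideline 3.5).
def familywise_error(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

fwe = familywise_error(14)  # by 14 tests, a false rejection is more likely than not
```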
One thing I need to get into the habit of doing: when reporting p-values, be clear about which test each p-value comes from.
Typo: For instance, outcomes can be compared either side of several different cut-points, and the optimal cut-point chosen as the one associated with the smallest p-value.
Read up to and including 4.14. Too many guidelines, will come back to this later.
Vocab
umbrella trials: clinical trial to test how well a new substance works on patients with a single type of cancer, but different mutations/biomarkers
treatment allocation: putting patients in different treatment groups in a clinical trial
interaction term: looking at how two or more things affect something together, not just individually
Notes:
One of most common/serious stat errors: ignoring time dependency of covariates in survival analyses (guideline 4.15)
Common survival analysis techniques, such as Cox regression or the Kaplan-Meier method, assume that all covariates are known at the start of follow-up
“guarantee-time bias” (also known as immortal-time bias)
landmark analysis technique is appropriate when the covariate of interest is known within a relatively short period of time after the start of follow-up
when performing a landmark analysis it is essential to report the number of subjects excluded in each group because they experienced the event of interest or were censored before the landmark, and the number of subjects whose covariate status changed after the landmark.
use of time-dependent covariates in Cox regression may be more appropriate than the landmark analysis approach when the value of a covariate changes during follow-up, where there is not an obvious landmark time, or when the use of a landmark time would lead to an unacceptable number of exclusions
Analysis of a time-dependent covariate in a Cox regression model requires setup of a specialized data set in a format known as counting process format, in which patients may have multiple rows of data corresponding to different time intervals.
If the changing value of a covariate over time is ignored, it is likely that the proportional hazards assumption, upon which Cox regression relies, is violated, which can lead to incorrect estimates and conclusions.
An important drawback of the time-dependent covariate approach is that it cannot be used to generate prediction of survival over time, and interpretation of the time-dependent coefficients can be difficult.6,7
Take Home points:
- The use of traditional survival analysis methods such as Cox regression or Kaplan-Meier analysis requires that all covariates are known at the start of follow-up and remain constant throughout the follow-up period.
- Covariates whose status changes during follow-up are known as time-dependent covariates. When the time-dependency of covariates is ignored, Kaplan-Meier estimates are subject to “guarantee-time bias” and the proportional hazards assumption of Cox regression may be violated.
- Cox regression can accommodate a time-varying covariate using a data set in the appropriate specialized format, but this method cannot be used to generate survival estimates.
- Landmark analysis is useful when the covariate of interest is known within a relatively short period of time after the start of follow-up. A fixed landmark time is selected, and the start of follow-up and covariate status are defined at the time of the landmark. This method can be used to generate survival estimates, but requires exclusion of patients who had the event or were censored prior to the landmark time.
- Analyses that do not appropriately handle time-dependent covariates violate the assumptions of the standard survival analyses, and submissions that do not analyze time-dependent covariates appropriately will not pass statistical review at The Journal of Urology!
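A sketch of what the counting-process expansion looks like for a single binary time-dependent covariate (hypothetical patient, not from the paper; in practice this would be done with tooling like R's survival package):

```python
# Hypothetical sketch: split one patient's follow-up at the time a binary
# covariate switches on, yielding counting-process rows of the form
# (start, stop, covariate, event_in_interval).
def to_counting_process(follow_up, event, change_time=None):
    if change_time is None or change_time >= follow_up:
        return [(0, follow_up, 0, event)]   # covariate never changed
    return [(0, change_time, 0, 0),         # covariate off, no event yet
            (change_time, follow_up, 1, event)]

# followed 24 months, covariate turns on at month 10, event at end of follow-up
rows = to_counting_process(follow_up=24, event=1, change_time=10)
```

The first interval correctly attributes the covariate-free person-time, which is exactly the person-time that naive "ever vs never" coding misclassifies and that produces guarantee-time bias.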
Notes:
3 principles:
The title is misleading because the paper is mostly about things other than ‘innovations’. Even the ‘innovation’ they speak of is just looking at code. Nothing really innovative going on.
Notes:
Why are they using an acronym for enhanced recovery after surgery (ERAS), seems stupid
Does this REALLY need an acronym?: The postanesthesia care unit (PACU)
When they say ‘…with a multivariable logistic regression model, adjusting for age, BMI,…’, how do you adjust code-wise?
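Partial answer to my own question: "adjusting" code-wise just means including the covariates as extra columns in the model matrix, so the coefficient of interest is estimated with the others held fixed. In practice you would call something like R's glm(); the toy gradient-descent fit below (entirely made-up data) just shows the mechanics:

```python
import math, random

# Toy illustration of "adjusting for" a covariate: the confounder (age) is
# simply another column of X, so beta[1] is the exposure effect holding age fixed.
def fit_logistic(X, y, lr=0.1, steps=2000):
    """Coefficients [intercept, b1, b2, ...] by plain gradient descent."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            e = 1.0 / (1.0 + math.exp(-z)) - yi   # prediction error
            grad[0] += e
            for j, v in enumerate(xi):
                grad[j + 1] += e * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

random.seed(0)
# columns: exposure (0/1) and age (standardized); outcome depends on both
X = [[random.randint(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if 1.5 * e + 1.0 * a + random.gauss(0, 1) > 0 else 0 for e, a in X]
beta = fit_logistic(X, y)   # beta[1]: age-adjusted exposure effect
```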
Again, exclusions in results, guess this is standard practice
This table does not seem to be generated using gtsummary; there are cells (maybe this is an option?)
Vocab
perioperative: Perioperative refers to the entire span of surgical management, including the preoperative (before surgery), intraoperative (during surgery), and postoperative (after surgery) phases.
Ketorolac: a non-steroidal anti-inflammatory drug (NSAID) commonly used in Enhanced Recovery After Surgery (ERAS) protocols, inhibits cyclooxygenase, which may lead to slower bone healing, compromised postoperative kidney function, and increased perioperative bleeding.
lumpectomy: lump removed from breast
contralateral mastectomy: A contralateral mastectomy is a preventive surgery to remove the breast not affected by cancer to reduce the risk of future breast cancer.
Notes:
What is ‘matching nested’ and what is the ratio: Case-control study with 1:3 matching nested
What is the Vickers Team’s affiliation with Sweden? Seems like Emily does a lot of work with Sweden too.
They use LOWESS for non-linear relation between PSA and outcome, could a GAM be used instead?
This exclusion, shouldn’t it be in the methods and not the results?: We excluded one case (a man who died from prostate cancer)…
Vocab
operating characteristics: Operating characteristics of a medical test refer to its sensitivity, specificity, and predictive values, which collectively indicate its ability to correctly identify those with and without the disease (GPT definition)
nested case-control design: A nested case-control study is a type of observational research within a larger cohort study that compares exposure to certain risk factors between those who develop an outcome of interest (“cases”) and a subset of participants who did not develop the outcome (“controls”).
labile: easily altered
conditional logistic regression: Conditional logistic regression is a statistical method used to examine the relationship between a binary outcome and one or more independent variables, while accounting for matching in case-control studies, like testing the association between prostate specific antigen concentrations and the occurrence of a certain condition in a prostate health study.
matching in case-control studies: Matching in case-control studies is a process where each case (individual with a condition) is paired with one or more controls (individuals without the condition) who share similar characteristics, such as age or sex, to help isolate the effect of the exposure being studied.
Lorenz Curve: In this study, the Lorenz curve is used to visually represent the distribution of prostate cancer risk across different prostate specific antigen concentrations, providing insight into how much of the total estimated risk is accounted for by different segments of the population.
Notes:
Thought the whole point of CoxPH is to use multiple predictors, but here we have ‘Separate univariable and multivariable Cox proportional hazards used four possible categorizations of Gleason scores.’ Need to look into this.
What is the difference between 3+4 & 4+3? A Gleason score 7 can represent mostly well-differentiated cancer with a lesser component of more poorly differentiated cancer (Gleason 3 + 4 = 7) or mostly poorly differentiated cancer with a smaller component of well-differentiated cancer (4 + 3 = 7).
Interesting that there are no commas in numbers above 999 (e.g. 12 345)
Typically, BCR defined as post-op PSA value ≥0.2 ng/ml
Notes:
Interesting Fact: PSA test was introduced in Sweden in the mid-90s
EDTA: Ethylenediaminetetraacetic Acid: anticoagulating agent for blood and used to remove heavy metals from blood/body
This line, have to read up on the reference for it, would like to understand how the PSA values were imputed: ‘Owing to the case-control design of the Malmö Preventive Project (three controls matched to each index case), blood was not sampled for some participants and we had to impute their PSA values, using methods previously described’
This line, have to look into how Nelson-Aalen is used to plot: ‘The Nelson-Aalen method was used to plot the cumulative hazards for these events.’
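My understanding of the Nelson-Aalen estimator: at each event time, add (events at t) / (number at risk just before t) to a running cumulative hazard, and plot that step function. A toy sketch with invented data:

```python
# Nelson-Aalen cumulative hazard sketch (toy data, illustrative only).
def nelson_aalen(times, events):
    """times: follow-up times; events: 1 = event, 0 = censored.
    Returns [(t, cumulative_hazard)] at each event time."""
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    h, out, i = 0.0, [], 0
    while i < len(order):
        t = order[i][0]
        d = at = 0
        while i < len(order) and order[i][0] == t:
            d += order[i][1]   # events at this time
            at += 1            # everyone leaving the risk set at this time
            i += 1
        if d:
            h += d / n_at_risk
            out.append((t, h))
        n_at_risk -= at
    return out

na = nelson_aalen([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored subjects contribute no jump but do shrink the risk set, which is why the later increments get larger.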
Why do binomial methods allow for obtaining confidence intervals for risk differences: ‘To obtain confidence intervals for risk differences, we used binomial methods.’
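My current understanding of the above: the event count in each arm is binomial, so each proportion has variance p(1 - p)/n, and the variances add for the difference of two independent proportions. A Wald-style normal-approximation sketch (the paper may use a different or exact binomial variant, so treat this as illustrative):

```python
import math

# Normal-approximation ("Wald") CI for a difference of two binomial
# proportions; the counts here are invented, not from the paper.
def risk_diff_ci(events1, n1, events2, n2, z=1.96):
    p1, p2 = events1 / n1, events2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = risk_diff_ci(30, 1000, 20, 1000)   # 3.0% vs 2.0% event rates
```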
Need to really understand this line, not making sense right now: ‘Since there was greater censoring before 15 years in the Gothenburg arm, and because event rates were positively correlated with PSA levels, binomial methods bias against our hypothesis that the benefits of screening are small for men with lower PSA levels.’
This makes sense, but need it laid out mathematically for more clarity, and also the subsequent calculation for the # needed to be diagnosed: ‘We calculated the number needed to screen as the inverse of the absolute risk reduction between the screened and unscreened groups based on 15 years of follow-up.’
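Laid out for myself: absolute risk reduction ARR = risk_unscreened - risk_screened, and number needed to screen NNS = 1 / ARR; the number needed to diagnose would be the same inverse idea applied to the excess diagnoses. Toy numbers below, not the paper's:

```python
# NNS = 1 / absolute risk reduction (the numbers here are invented).
def number_needed_to_screen(risk_unscreened, risk_screened):
    arr = risk_unscreened - risk_screened   # absolute risk reduction
    return 1.0 / arr

nns = number_needed_to_screen(0.010, 0.006)   # ARR of 0.4 percentage points
```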
This paper also has results interspersed with figures/tables
Only read up till discussion (including M&R), did not read discussion
Notes:
One thing I noted was that the tables and figures are mixed in with the results section, which I did not do for my first actual project. In my project I grouped the tables and figures at the bottom.
Additionally, the tables and figures have shorter names that are more to-the-point than mine, and they don’t have repeated information such as the same exclusion criteria listed among all the tables/figures.
I noticed that, again, the methods and results all flow together to paint a picture that answers the initial research question unequivocally, which is what I need to do in my papers.
Criticisms:
Notes:
Abstract Parts:
1. Background
2. Objective
3. Design, Setting, and Participants
4. Outcome Measurements and Statistical Analysis
5. Results and Limitations
6. Conclusions
Key Words I didn’t know:
discrete choice experiments (DCE), conditional logit models, marginal rates of substitution (MRS)
Discrete Choice Experiments: A way of quantifying preferences through analyzing decisions made by participants. Participants pick between competing scenarios with different combinations of attributes from several categorical features.
Conditional Logit Models: A conditional logit model is a statistical model used to examine and model the relationship between a categorical dependent variable and one or more independent variables. This is a specific type of logit model that is particularly useful in cases where the dependent variable represents a choice, and the aim is to understand how different attributes of the choices influence the likelihood of each being selected.
The materials and methods has 4 sections: Expert Panel, DCE survey design, study participants, statistical analyses
Marginal Rates of Substitution: These portray the rates at which respondents are willing to trade their choices in one attribute for preferred levels on another attribute
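Putting the three vocab items together (all coefficients and attributes below are made up for illustration): a conditional logit gives each alternative a utility x·beta with coefficients shared across alternatives, choice probabilities are a softmax within each choice set, and an MRS is a ratio of coefficients:

```python
import math

# Toy conditional-logit choice probabilities and a marginal rate of
# substitution; numbers invented for illustration.
def choice_probs(alternatives, beta):
    """P(choose k) = exp(x_k . beta) / sum_j exp(x_j . beta)."""
    utils = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    m = max(utils)                        # stabilize the softmax
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

beta = [2.0, -0.5]        # hypothetical: benefit coefficient, cost coefficient
mrs = -beta[0] / beta[1]  # cost units a respondent trades for one unit of benefit

probs = choice_probs([[1.0, 2.0], [0.0, 0.0]], beta)
```

So the MRS falls straight out of the fitted coefficients, which is why the paper can report willingness-to-trade rates without eliciting them directly.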
All in all, throughout the paper, the research question and the methods used to answer the question remain lucid, and the results are a cogent story that can give insights into the nature of the issue and provides a normative statement based on that story.
Right now, because I don’t have a breadth or depth of knowledge in the field, I lack the foundational knowledge to judge papers as well/poorly written.